feat: Add singleton record fetch and MCP tool for declarative sources #872

aaronsteers · 2025-11-20T19:46:12Z

feat: Add singleton record fetch and MCP tool for declarative sources

Summary

Implements Source.get_record(stream_name, pk_value) and DeclarativeExecutor.fetch_record() to enable fetching single records by primary key from declarative (YAML-based) sources, plus a new get_source_record MCP tool to expose this functionality to AI agents.

Key features:

Public API: source.get_record(stream_name, pk_value="123")
Accepts string, int, or dict formats for pk_value
Reuses existing CDK components (SimpleRetriever, HttpClient, RecordSelector)
Validates primary keys and rejects composite keys with NotImplementedError
New AirbyteRecordNotFoundError exception for missing records
Scanning fallback option when direct fetch fails
MCP tool get_source_record for AI agent integration
Comprehensive unit tests (14 tests, all passing)

Implementation approach:

Uses already-instantiated streams from ConcurrentDeclarativeSource.streams()
Constructs HTTP GET requests by appending /{pk_value} to stream's base path
Accesses CDK private methods (_request_headers, etc.) with noqa markers
Only supports SimpleRetriever-based streams currently
Added catalog_provider property to Source class for reusable CatalogProvider access

Review & Testing Checklist for Human

⚠️ CRITICAL - This PR has NOT been tested with real connectors, only mocked unit tests

Test with real declarative source (e.g., source-pokeapi): Verify source.get_record("pokemon", pk_value="1") works end-to-end
Test MCP tool integration: Verify get_source_record MCP tool works correctly with AI agents/MCP clients
Verify path construction works for your target APIs - the implementation assumes /{pk_value} appended to base path, which may not work for all REST API patterns
Check error handling: Test with non-existent records, authentication failures, rate limiting, malformed responses
Review CDK private API usage: Lines using retriever._request_headers() etc. are marked with noqa but could break in future CDK versions

Test Plan Recommendation

import airbyte as ab

# Test with a real declarative source
source = ab.get_source("source-pokeapi", config={})
source.select_streams(["pokemon"])

# Test basic fetch
record = source.get_record("pokemon", pk_value="1")
assert record["name"] == "bulbasaur"

# Test error cases
try:
    source.get_record("pokemon", pk_value="999999")
except ab.exceptions.AirbyteRecordNotFoundError:
    print("✓ Correctly raises RecordNotFoundError")

# Test different PK formats
source.get_record("pokemon", pk_value=1)  # int
source.get_record("pokemon", pk_value={"id": "1"})  # dict

# Test MCP tool
from airbyte.mcp.local_ops import get_source_record
result = get_source_record(
    source_connector_name="source-pokeapi",
    stream_name="pokemon",
    pk_value="1",
    config={},
)
assert isinstance(result, dict)

Notes

Composite primary keys are not supported - raises NotImplementedError (may want to add support later)
Only works with declarative sources - raises NotImplementedError for Python/Docker sources (unless allow_scanning=True)
Path construction is hardcoded - assumes REST pattern of base_path/{pk_value}, may need customization for different API patterns
Type casting uses type: ignore - The dict(first_record.data) conversion (lines 295-297 in declarative.py) could cause runtime errors with actual record structures
All linter checks (ruff, pyrefly) and unit tests pass locally
CI failures are only the flaky PokeAPI test (unrelated to changes)

Requested by: AJ Steers (@aaronsteers, [email protected])
Devin session: https://app.devin.ai/sessions/9b1bcd7c48bf4d259675a0033564d3fb

Summary by CodeRabbit

New Features
- Fetch individual records directly from declarative sources by primary key.
- CLI/local operation to retrieve a single record from a connector by primary key.
Improvements
- Public source API to retrieve a record with primary-key normalization and clearer errors for missing, composite, or invalid PKs.
- Executor delegation for direct fetches with an optional timed scan fallback and clearer not-found reporting.
Tests
- Added unit tests covering record retrieval, PK validation, error cases, and executor delegation.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Implements Source.get_record() and DeclarativeExecutor.fetch_record() methods to enable fetching single records by primary key value from declarative sources. Key features: - Source.get_record(stream_name, pk_value) - Public API for fetching records - DeclarativeExecutor.fetch_record() - Internal implementation using CDK components - Primary key validation and normalization (supports string, int, dict formats) - Composite primary key detection (raises NotImplementedError) - New AirbyteRecordNotFoundError exception for missing records - Comprehensive unit tests with proper mocking This implementation reuses existing CDK components (SimpleRetriever, HttpClient, RecordSelector) without monkey-patching or pinning CDK versions, providing a hybrid approach that works with the current CDK release. Related to CDK PR airbytehq/airbyte-python-cdk#846 Co-Authored-By: AJ Steers <[email protected]>

devin-ai-integration · 2025-11-20T19:46:17Z

Original prompt from AJ Steers

Received message in Slack channel #ask-devin-ai:

@Devin - Find my PR to the Python CDK related to adding singleton record fetches. Then find my related PyAirbyte PR that proposes to build on top of the first. I want you to create a new PyAirbyte PR that ports or monkey-patches the functionality into PyAirbyte, bypassing the need to update the CDK at all.
Thread URL: https://airbytehq-team.slack.com/archives/C08BHPUMEPJ/p1763667021473209

devin-ai-integration · 2025-11-20T19:46:18Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

github-actions · 2025-11-20T19:46:37Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1763667280-add-get-record-without-cdk-dependency' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1763667280-add-get-record-without-cdk-dependency'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/fix-pr - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

📝 Edit this welcome message.

airbyte/_executors/declarative.py

Co-Authored-By: AJ Steers <[email protected]>

github-actions · 2025-11-20T19:48:16Z

PyTest Results (Fast Tests Only, No Creds)

362 tests +42 362 ✅ +42 5m 58s ⏱️ -1s
1 suites ± 0 0 💤 ± 0
1 files ± 0 0 ❌ ± 0

Results for commit ba85aaf. ± Comparison against base commit 2981b3d.

♻️ This comment has been updated with latest results.

coderabbitai · 2025-11-20T19:51:00Z

📝 Walkthrough

Walkthrough

Adds single-record retrieval by primary key: new DeclarativeExecutor.fetch_record, new AirbyteRecordNotFoundError exception, Source helpers and get_record with declarative delegation and optional scanning fallback, a CLI/local helper, and unit tests covering these behaviors.

Changes

Cohort / File(s)	Summary
Exceptions `airbyte/exceptions.py`	Added `AirbyteRecordNotFoundError(AirbyteConnectorError)` dataclass with optional `stream_name` and `primary_key_value` to represent missing-record errors.
Declarative executor `airbyte/_executors/declarative.py`	Added `DeclarativeExecutor.fetch_record(stream_name: str, primary_key_value: str) -> dict[str, Any]`: validates and unwraps target stream, enforces `AbstractStream` and `SimpleRetriever`, builds fetch path, issues request via retriever.requester (using headers/params/body helpers and an empty `StreamSlice`), interprets response, derives record schema via `schema_loader` when present, selects records via `record_selector` (or JSON-dict fallback), raises `AirbyteRecordNotFoundError` on no response/no records, and returns the first record (unwrapping `.data` if present).
Source base `airbyte/sources/base.py`	Added `catalog_provider` property, `_normalize_and_validate_pk_value(stream_name, pk_value) -> str`, and `get_record(stream_name, *, pk_value, allow_scanning=False, scan_timeout_seconds=5) -> dict[str, Any]`: resolve/validate stream PK (error on missing or composite PKs), normalize PK input forms (str/int/dict with single key), delegate to declarative executor when applicable, and optionally fall back to scanning with timeout.
CLI / local ops `airbyte/mcp/local_ops.py`	Added `get_source_record(...)` helper to fetch a single record from a connector by primary key, invoking `Source.get_record` and returning either the record or an error string with traceback; appears added in two identical places.
Tests `tests/unit_tests/test_get_record.py`	New unit tests for `_get_stream_primary_key`/`_normalize_and_validate_pk_value`, `Source.get_record` flows (delegation, not-implemented non-declarative, scanning fallback), and `DeclarativeExecutor.fetch_record` validations and not-found behavior.

Sequence Diagram

sequenceDiagram
    participant User as User
    participant Source as Source
    participant Exec as DeclarativeExecutor
    participant Retriever as SimpleRetriever
    participant HTTP as HTTPRequester

    User->>Source: get_record(stream_name, pk_value)
    activate Source
    Source->>Source: catalog_provider / normalize PK
    Source->>Exec: fetch_record(stream_name, primary_key_value)
    activate Exec

    Exec->>Exec: resolve & unwrap stream
    Exec->>Retriever: ensure SimpleRetriever & get_path(primary_key)
    Exec->>HTTP: requester.request(method, path, headers, params, body, slice=StreamSlice.empty)
    activate HTTP
    HTTP-->>Exec: response
    deactivate HTTP

    alt no response or no records
        Exec-->>Source: AirbyteRecordNotFoundError
    else records found
        Exec->>Exec: extract via record_selector or JSON-dict fallback
        Exec-->>Source: first record (dict)
    end
    deactivate Exec

    alt Exec returned record
        Source-->>User: record
    else error
        Source-->>User: error
    end
    deactivate Source

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Focus review on:
- airbyte/_executors/declarative.py: stream unwrapping, strict SimpleRetriever checks, correct construction of request (path/headers/params/body) and use of StreamSlice.empty, and robust handling of record_selector vs JSON fallback.
- airbyte/sources/base.py: primary-key resolution/validation logic, handling of dict PK inputs and composite-PK errors, and scanning fallback/timeout semantics.
- airbyte/mcp/local_ops.py: duplicated get_source_record additions (remove one copy?), and consistent error-return format.
- tests/unit_tests/test_get_record.py: ensure mocks align with retriever/requester behavior and test edge cases (no records, unwrapped .data, non-declarative sources).

Would you like an additional targeted unit test for the JSON-dict fallback case, wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main changes: adding singleton record fetch and MCP tools for declarative sources, which aligns with the core features introduced across the modified files.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch devin/1763667280-add-get-record-without-cdk-dependency

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (7)

airbyte/exceptions.py (1)

415-421: Optional: include primary_key_value in safe logging context?

AirbyteRecordNotFoundError looks consistent with the surrounding connector errors. Would it be useful to also surface primary_key_value from this exception in PyAirbyteError.safe_logging_dict (similar to stream_name), so logs have enough context when a lookup fails, assuming the PK isn’t considered sensitive in your threat model, wdyt?

tests/unit_tests/test_get_record.py (2)

24-57: Stabilize expectations around fetch_record “happy path”?

In test_declarative_executor_fetch_record_stream_validation, the “valid_stream_and_pk” case currently asserts that fetch_record raises one of (NotImplementedError, AttributeError, KeyError) (Line [56]), i.e. any internal failure after stream lookup. As the CDK integration evolves and ModelToComponentFactory wiring stabilizes, this path may eventually succeed, which would flip this test from “green” to “red” even though behavior improved.

Would it be more future‑proof to explicitly assert only the stream‑existence contract here—for example by mocking out ModelToComponentFactory.create_component and SimpleRetriever so that the test simply verifies that no AirbyteStreamNotFoundError is raised for an existing stream, and leaves the downstream behavior to more targeted tests, wdyt?

144-149: Use typing.Any instead of bare any in type hints?

Both test_source_normalize_and_validate_pk_value and test_source_get_record_accepts_various_pk_formats annotate pk_value as any. With from __future__ import annotations, that ends up as the string "any", which type checkers typically won’t recognize as Any.

Would you consider importing Any from typing and updating these annotations to pk_value: Any for clarity and better static‑analysis support, wdyt?

Also applies to: 217-217

airbyte/_executors/declarative.py (1)

18-25: Double‑check CDK integration and edge cases in fetch_record

The high‑level flow in fetch_record looks aligned with the intended design, but a few CDK integration details and edge cases might be worth validating before relying on this in production:

Factory usage and manifest shape (Lines [192]-[207])
Here you call ModelToComponentFactory.create_component with model_type=type(retriever_config) where retriever_config is pulled directly from the manifest dict. My understanding is that CDK factories typically expect a Pydantic model instance (or specific declarative model types) as component_definition, not a raw dict. Depending on the exact CDK version/contract, this could either work by accident or fail at runtime and surface as the NotImplementedError you wrap around it.
Would it make sense to (a) confirm what create_component expects in the current CDK, and/or (b) add a small integration test using a real declarative manifest to ensure this wiring actually produces a SimpleRetriever, wdyt?

Schema/record selector wiring (Lines [262]-[273])
records_schema is currently derived from stream_config.get("schema_loader", {}), but in many declarative manifests schema_loader is a configuration for a loader rather than the resolved JSON schema. Are you sure record_selector.select_records expects the loader config here, rather than the actual JSON schema (or even None), and if not, should we either:

resolve the schema via the loader, or

explicitly pass {} and rely on selectors that don’t need schema,
wdyt?

Empty records but non‑dict JSON responses (Lines [275]-[282])
When records is empty, you only special‑case non‑empty dict JSON bodies. If a connector returns [{"id": "123", ...}] (a single‑element list) for a detail endpoint, this will currently raise AirbyteRecordNotFoundError even though the response is there. Would you consider handling the “single‑element list” case by returning that single element (and perhaps logging a warning), or is it intentional to only support dict‑shaped bodies for now, wdyt?

HTTP status / 404 handling
Since you call send_request directly, a 404 might be represented as either:

an exception from the requester, or

a normal response object with status 404 but no body/records.
If it’s the latter, would you want to explicitly check response.status_code (when available) and map a 404 to AirbyteRecordNotFoundError for clearer semantics, wdyt?

Path construction & URL encoding (Line [225])
fetch_path = f"{base_path}/{primary_key_value}".lstrip("/") works well for numeric/simple IDs, but if a primary key ever contains reserved URL characters (e.g., spaces, slashes), the request path could be malformed. Would it be safer to quote the primary_key_value (e.g., via urllib.parse.quote) before appending, or at least leave a comment/TODO here to revisit once you see real‑world connectors, wdyt?

Given your PR description already calls out that this isn’t yet exercised against real connectors, adding a small note or follow‑up task around these points could help ensure the API behaves as expected once wired into actual YAML manifests, wdyt?

Also applies to: 150-297

airbyte/sources/base.py (3)

605-621: Clarify interaction between primary‑key overrides and nested PK shapes

_get_stream_primary_key nicely flattens the “nested list” Airbyte PK structure (e.g., [[\"id\"], [\"org_id\"]] → ["id", "org_id"]). One subtle corner case is how this interacts with overrides coming from set_primary_key:

set_primary_key("stream", ["id", "org_id"]) stores ["id", "org_id"] in _primary_key_overrides.

get_configured_catalog then wraps that as [self._primary_key_overrides[...]], yielding primary_key=[[\"id\", \"org_id\"]] for the configured stream.

_get_stream_primary_key sees pk[0] as a list and flattens via [field[0] if isinstance(field, list) else field for field in pk], which would return only "id" in this specific override shape.

Would it be worth either:

clarifying in docs that set_primary_key should be used only for single‑field PKs for now, or

adjusting the flattening logic/tests to cover the override case so composite PK overrides don’t get silently truncated, wdyt?

628-675: Tighten validation of pk_value (e.g., None) before coercing to string?

_normalize_and_validate_pk_value does a good job of enforcing single‑field PKs and validating dict input against the expected field. Right now, any non‑dict value—including None or empty strings—is accepted and coerced via str(pk_value) (Line [675]), which could lead to requests like /users/None or /users/ if a caller accidentally passes a falsy/placeholder value.

Would you consider:

explicitly rejecting pk_value is None (and perhaps pk_value == "") with a PyAirbyteInputError, and

maybe including the stream name and PK field in the error message to aid debugging,

so consumers get a clear input‑validation failure instead of a surprising HTTP path, wdyt?

677-714: Confirm get_record is the only entrypoint that should call _normalize_and_validate_pk_value

get_record is currently the single public entrypoint using _normalize_and_validate_pk_value, and it correctly enforces DeclarativeExecutor before delegating. Given that _normalize_and_validate_pk_value is fairly generic, do you want to keep it private and only reachable via get_record, or would you prefer to document it (or expose a wrapper) for advanced users building their own request logic on top of Source?

If you intend it to remain strictly internal, maybe a brief comment above _normalize_and_validate_pk_value noting that it’s “used only by get_record for now” could help future maintainers avoid calling it in incompatible contexts, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2981b3d and e383e5a.

📒 Files selected for processing (4)

airbyte/_executors/declarative.py (2 hunks)
airbyte/exceptions.py (1 hunks)
airbyte/sources/base.py (2 hunks)
tests/unit_tests/test_get_record.py (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2024-10-08T15:34:31.026Z

Learnt from: Suraj-Vishwakarma70
Repo: airbytehq/PyAirbyte PR: 285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.

Applied to files:

airbyte/sources/base.py

🧬 Code graph analysis (4)

airbyte/exceptions.py (2)

airbyte/datasets/_sql.py (1)

stream_name (86-87)

airbyte/_batch_handles.py (1)

stream_name (47-49)

airbyte/_executors/declarative.py (1)

airbyte/exceptions.py (2)

AirbyteStreamNotFoundError (400-404)

AirbyteRecordNotFoundError (416-420)

airbyte/sources/base.py (3)

airbyte/_executors/declarative.py (2)

DeclarativeExecutor (47-297)

fetch_record (150-297)

airbyte/shared/catalog_providers.py (1)

configured_catalog (72-74)

airbyte/exceptions.py (2)

AirbyteStreamNotFoundError (400-404)

PyAirbyteInputError (201-210)

tests/unit_tests/test_get_record.py (3)

airbyte/_executors/declarative.py (2)

DeclarativeExecutor (47-297)

fetch_record (150-297)

airbyte/sources/base.py (3)

_get_stream_primary_key (605-626)

_normalize_and_validate_pk_value (628-675)

get_record (677-714)

airbyte/exceptions.py (2)

AirbyteStreamNotFoundError (400-404)

PyAirbyteInputError (201-210)

🪛 GitHub Actions: Run Linters

tests/unit_tests/test_get_record.py

[error] 1-1: Ruff formatting failed: 1 file would be reformatted. Exit code 1 from 'poetry run ruff format --diff .'

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Pytest (All, Python 3.11, Windows)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Windows)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (No Creds)

tests/unit_tests/test_get_record.py

github-actions · 2025-11-20T20:15:55Z

PyTest Results (Full)

430 tests +41 414 ✅ +41 25m 39s ⏱️ - 1m 27s
1 suites ± 0 16 💤 ± 0
1 files ± 0 0 ❌ ± 0

Results for commit ba85aaf. ± Comparison against base commit 2981b3d.

This pull request removes 1 and adds 42 tests. Note that renamed tests count towards both.

tests.integration_tests.cloud.test_custom_definitions ‑ test_safe_mode_deletion[delete:test-yaml-source-{suffix}-True]

tests.unit_tests.test_get_record ‑ test_declarative_executor_fetch_record_stream_validation[stream_not_found]
tests.unit_tests.test_get_record ‑ test_declarative_executor_fetch_record_stream_validation[valid_stream_and_pk]
tests.unit_tests.test_get_record ‑ test_source_get_record_accepts_various_pk_formats[dict_pk]
tests.unit_tests.test_get_record ‑ test_source_get_record_accepts_various_pk_formats[int_pk]
tests.unit_tests.test_get_record ‑ test_source_get_record_accepts_various_pk_formats[string_pk]
tests.unit_tests.test_get_record ‑ test_source_get_record_calls_executor_fetch_record
tests.unit_tests.test_get_record ‑ test_source_get_record_requires_declarative_executor
tests.unit_tests.test_get_record ‑ test_source_normalize_and_validate_pk_value[composite_primary_key]
tests.unit_tests.test_get_record ‑ test_source_normalize_and_validate_pk_value[dict_with_correct_key]
tests.unit_tests.test_get_record ‑ test_source_normalize_and_validate_pk_value[dict_with_multiple_entries]
…

♻️ This comment has been updated with latest results.

…ch_record() - Remove ModelToComponentFactory usage in favor of accessing existing streams - Add _unwrap_to_declarative_stream() helper to navigate concurrent wrappers - Update fetch_record() to call declarative_source.streams() for existing components - Fix unit tests to mock declarative_source property correctly - Add type ignore comments for duck-typed attribute access Co-Authored-By: AJ Steers <[email protected]>

airbyte/_executors/declarative.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

airbyte/_executors/declarative.py (1)
325-342: Tighten the JSON fallback and replace the bare except Exception: pass

The try/except Exception: pass around response.json() on Lines [327]-[331] both repeats the earlier “empty except” concern and makes it harder to notice unexpected decode failures. Would you consider:

Switching to with contextlib.suppress(Exception): around the response.json() call (as you already do for schema_loader.get_json_schema()), so the intent to ignore parse errors is explicit without an empty except; and

Optionally also handling the case where response.json() returns a non-empty list (e.g. a one-element [{...}]) by treating a single dict element as the record, to cover APIs that return arrays even for singleton lookups?

That would preserve the current not-found semantics while addressing the lint concern and slightly widening compatibility, wdyt?
-        if not records:
-            try:
-                response_json = response.json()
-                if isinstance(response_json, dict) and response_json:
-                    return response_json
-            except Exception:
-                pass
+        if not records:
+            with contextlib.suppress(Exception):
+                response_json = response.json()
+                if isinstance(response_json, dict) and response_json:
+                    return response_json
+                if (
+                    isinstance(response_json, list)
+                    and len(response_json) == 1
+                    and isinstance(response_json[0], dict)
+                ):
+                    return response_json[0]

🧹 Nitpick comments (1)

airbyte/_executors/declarative.py (1)

45-89: Consider iterating through wrapper chains in _unwrap_to_declarative_stream

Right now the helper only peeks one level through the known wrapper attributes and branch attrs (Lines [64]-[83]), so a future wrapper-of-wrapper that still ultimately exposes a retriever via the same attributes would end up raising NotImplementedError even though it could be unwrapped. Would you consider turning this into an iterative loop (tracking visited objects and repeatedly following wrapper_attrs / branch_attr until a retriever is found or a depth limit is reached) to make it more robust to additional wrapper layers while keeping the same error behavior when no retriever exists, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d28249 and eadcfb2.

📒 Files selected for processing (2)

airbyte/_executors/declarative.py (4 hunks)
tests/unit_tests/test_get_record.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/unit_tests/test_get_record.py

🧰 Additional context used

🧬 Code graph analysis (1)

airbyte/_executors/declarative.py (1)

airbyte/exceptions.py (2)

AirbyteStreamNotFoundError (400-404)

AirbyteRecordNotFoundError (416-420)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.11, Windows)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Windows)
GitHub Check: Pytest (No Creds)

airbyte/_executors/declarative.py

- Remove config parameter from DeclarativeExecutor.fetch_record() - Remove config argument from Source.get_record() call to fetch_record() - Executor already has full config in self._config_dict, no need to pass it Co-Authored-By: AJ Steers <[email protected]>

airbyte/_executors/declarative.py

…rameter The test was expecting the config parameter that was removed in the previous commit. Updated the assertion to match the new signature. Co-Authored-By: AJ Steers <[email protected]>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

airbyte/_executors/declarative.py (1)
321-327: Address the silent exception suppression with logging.

The empty except block at lines 326-327 silently suppresses exceptions when attempting to parse response.json(). This was flagged in previous review comments as needing an explanatory comment or logging.

While the current behavior (try to parse JSON, fall through to error if it fails) is reasonable, adding logging would help with debugging when unexpected responses occur.

Would you consider adding logging here as suggested in the previous review? For example:
+import logging
+
+logger = logging.getLogger(__name__)

# ... in fetch_record method ...

         if not records:
             try:
                 response_json = response.json()
                 if isinstance(response_json, dict) and response_json:
                     return response_json
-            except Exception:
-                pass
+            except Exception as e:
+                logger.debug(
+                    "Failed to parse response as JSON when fetching record for stream '%s', "
+                    "primary key '%s': %s",
+                    stream_name,
+                    primary_key_value,
+                    str(e),
+                )
This provides visibility into parsing failures while maintaining the existing fallthrough behavior.

🧹 Nitpick comments (5)

airbyte/sources/base.py (2)
605-626: Consider adding type validation for primary key elements, wdyt?

At line 620, when pk is a flat list (not nested), we cast it to list[str] with a type ignore comment. However, there's no explicit validation that the elements are actually strings. If the catalog contains unexpected types, this could propagate invalid data downstream.

Would it be safer to explicitly validate and convert elements to strings? For example:
                 if isinstance(pk, list) and len(pk) > 0:
                     if isinstance(pk[0], list):
                         return [field[0] if isinstance(field, list) else field for field in pk]
-                    return list(pk)  # type: ignore[arg-type]
+                    return [str(field) for field in pk]
This would ensure type safety and make the conversion explicit. Alternatively, if we trust the catalog format completely, a comment explaining why the type ignore is safe would help future maintainers.

628-675: Consider validating against None and empty primary key values, wdyt?

The method converts pk_value to a string without checking for None or empty values. If a user accidentally passes None, it would be converted to the string "None", which is probably not the intended primary key value. Similarly, empty strings or whitespace-only strings might warrant validation.

Would it make sense to add validation like this after line 675?
         return str(pk_value[provided_key])
 
     return str(pk_value)
+
+def _normalize_and_validate_pk_value(
+    self,
+    stream_name: str,
+    pk_value: Any,  # noqa: ANN401
+) -> str:
+    """Normalize and validate a primary key value."""
+    # ... existing logic ...
+    
+    result = str(pk_value)  # or the dict extraction above
+    
+    if not result or result == "None":
+        raise exc.PyAirbyteInputError(
+            message=f"Primary key value cannot be None or empty for stream '{stream_name}'.",
+            input_value=str(pk_value),
+        )
+    
+    return result
This would catch common user errors early with a clear message.
airbyte/_executors/declarative.py (3)
45-89: Consider validating that retriever is not None, wdyt?

The unwrapping logic checks hasattr(unwrapped, "retriever") at lines 76 and 82, but doesn't verify that the retriever attribute itself is not None. If a stream wrapper has a retriever attribute set to None, this function would return the wrapper, and the caller at line 249 would attempt to access it without a null check.

Would it be safer to add a not-None check? For example:
         if hasattr(stream, attr_name):
             unwrapped = getattr(stream, attr_name)
-            if unwrapped is not None and hasattr(unwrapped, "retriever"):
+            if unwrapped is not None and hasattr(unwrapped, "retriever") and unwrapped.retriever is not None:
                 return unwrapped
Similarly for the branch attributes at lines 81-83. This would make the function more robust against unexpected CDK states.

257-267: Consider adding validation for path construction edge cases, wdyt?

The path construction at lines 264-267 assumes that base_path is well-formed and simply appends the primary key value. However, there are potential edge cases:

What if base_path ends with multiple slashes (e.g., "/users//")?

What if base_path already contains a path segment that looks like a placeholder?

What if primary_key_value contains URL-unsafe characters?

Would it be worth adding some defensive checks or URL encoding? For example:
from urllib.parse import quote

# After line 262
if base_path:
    # Normalize the base path and encode the PK value
    normalized_base = base_path.rstrip('/')
    encoded_pk = quote(str(primary_key_value), safe='')
    fetch_path = f"{normalized_base}/{encoded_pk}"
else:
    fetch_path = quote(str(primary_key_value), safe='')
This would handle special characters in PK values (like spaces, slashes, etc.) more robustly. Though if this isn't a concern for the initial implementation, it could be deferred.

195-343: Consider breaking down fetch_record into smaller helper methods, wdyt?

The fetch_record method is quite long (149 lines) and handles multiple responsibilities: stream lookup, retriever validation, path construction, HTTP request, schema retrieval, record selection, and response parsing. This complexity is reflected in the linter suppressions (PLR0914, PLR0912, PLR0915).

While the current implementation is functional, would it be beneficial to extract some of these concerns into smaller, testable helper methods? For example:
def _find_stream(self, stream_name: str, streams: list) -> object:
    """Find and return the target stream by name."""
    # Lines 220-245
    
def _build_fetch_path(self, retriever: SimpleRetriever, primary_key_value: str) -> str:
    """Construct the fetch path by appending PK to base path."""
    # Lines 257-267
    
def _send_fetch_request(self, retriever: SimpleRetriever, fetch_path: str, ...) -> Response:
    """Send the HTTP request to fetch the record."""
    # Lines 269-290
    
def _extract_record_from_response(self, response, retriever, stream_name, primary_key_value, ...) -> dict:
    """Extract and return the record from the HTTP response."""
    # Lines 304-343
This would make the code more maintainable, easier to test individual components, and reduce cognitive complexity. However, if you prefer to keep it as-is for the initial implementation, that's also reasonable.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eadcfb2 and 17f6c9e.

📒 Files selected for processing (2)

airbyte/_executors/declarative.py (4 hunks)
airbyte/sources/base.py (2 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2024-10-08T15:34:31.026Z

Learnt from: Suraj-Vishwakarma70
Repo: airbytehq/PyAirbyte PR: 285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.

Applied to files:

airbyte/sources/base.py

🧬 Code graph analysis (2)

airbyte/sources/base.py (3)

airbyte/_executors/declarative.py (2)

DeclarativeExecutor (92-343)

fetch_record (195-343)

airbyte/shared/catalog_providers.py (1)

configured_catalog (72-74)

airbyte/exceptions.py (2)

AirbyteStreamNotFoundError (400-404)

PyAirbyteInputError (201-210)

airbyte/_executors/declarative.py (2)

airbyte/_connector_base.py (1)

name (84-86)

airbyte/exceptions.py (2)

AirbyteStreamNotFoundError (400-404)

AirbyteRecordNotFoundError (416-420)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Windows)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Windows)

🔇 Additional comments (2)

airbyte/sources/base.py (1)

33-33: LGTM!

The import is necessary for the get_record implementation and is properly placed.

airbyte/_executors/declarative.py (1)

6-6: LGTM!

The new imports are necessary for the fetch_record implementation and are used appropriately throughout the code.

Also applies to: 19-20, 22-22

airbyte/sources/base.py

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/unit_tests/test_get_record.py (1)

17-90: Consider adding tests for additional DeclarativeExecutor edge cases?

Looking at the implementation in airbyte/_executors/declarative.py, a few edge cases could strengthen coverage:

Non-SimpleRetriever case: When a stream uses a different retriever type (should raise NotImplementedError)

Response is None: When send_request() returns None (should raise AirbyteRecordNotFoundError)

Empty records with fallback: When select_records() returns empty but response.json() has data

Stream unwrapping: When streams are wrapped and need unwrapping to find the declarative stream

These would catch regressions in the error-handling paths. Given that the PR notes this hasn't been tested with real connectors yet, these additional tests might catch issues before integration testing, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17f6c9e and 42bfd1f.

📒 Files selected for processing (1)

tests/unit_tests/test_get_record.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/unit_tests/test_get_record.py (2)

airbyte/_executors/declarative.py (2)

DeclarativeExecutor (92-343)

fetch_record (195-343)

airbyte/sources/base.py (3)

_get_stream_primary_key (605-626)

_normalize_and_validate_pk_value (628-675)

get_record (677-713)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Pytest (All, Python 3.10, Windows)
GitHub Check: Pytest (All, Python 3.11, Windows)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (No Creds)
GitHub Check: Pytest (Fast)

🔇 Additional comments (6)

tests/unit_tests/test_get_record.py (6)

1-16: LGTM!

The imports and module setup are clean and appropriate for the test suite.

17-90: LGTM!

The stream validation test covers both the success case and the stream-not-found error path effectively. The extensive mocking is appropriate for isolating the unit under test.

92-127: LGTM!

Excellent coverage of various primary key formats. The test thoroughly validates the PK extraction and normalization logic.

129-148: LGTM!

Good error case coverage for nonexistent streams.

205-220: LGTM!

Properly validates that get_record() is restricted to declarative sources.

222-242: LGTM!

The test effectively verifies that get_record() correctly delegates to executor.fetch_record() with proper parameter passing.

tests/unit_tests/test_get_record.py

- Remove _unwrap_to_declarative_stream() function entirely - Validate streams directly with isinstance(stream, AbstractStream) - Replace getattr() with direct attribute access after validation - Use guard statements instead of graceful error handling - Remove contextlib.suppress() around schema loading - Remove JSON fallback when select_records() returns empty - Update test to use Mock(spec=AbstractStream) for proper type checking This addresses review feedback to use strong typing and fail-fast error handling instead of duck-typing and graceful degradation. Co-Authored-By: AJ Steers <[email protected]>

airbyte/_executors/declarative.py

airbyte/sources/base.py

Address review feedback to validate stream name match first, then check type compatibility. This ensures we raise NotImplementedError for found streams of incompatible types rather than silently skipping them. - Match stream by name first - Validate AbstractStream type after name match - Raise NotImplementedError with clear message for incompatible types - Remove isinstance() filter from available_streams list Co-Authored-By: AJ Steers <[email protected]>

airbyte/sources/base.py

@aaronsteers

…ack to get_record() - Refactor _get_stream_primary_key() to use CatalogProvider.get_primary_keys() - Handle both flat and nested primary key formats - Add allow_scanning and scan_timeout_seconds parameters to get_record() - Implement scanning fallback when direct fetch fails or for non-declarative sources - Normalize dict keys with LowerCaseNormalizer for case-insensitive comparison - Defer PK validation to avoid early catalog access on non-declarative executors Addresses GitHub review comments from @aaronsteers on PR #872 Co-Authored-By: AJ Steers <[email protected]>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

airbyte/sources/base.py (2)
606-638: Consider simplifying by using CatalogProvider directly, wdyt?

The current implementation creates a temporary catalog (lines 623-630) just to call CatalogProvider.get_primary_keys(). Since self.configured_catalog already has properly formatted primary keys, could we simplify this to:
def _get_stream_primary_key(self, stream_name: str) -> list[str]:
    """Get the primary key for a stream."""
    catalog_provider = CatalogProvider(self.configured_catalog)
    return catalog_provider.get_primary_keys(stream_name)
This would:

Eliminate the manual catalog iteration and temporary object creation

Reuse the validation logic already in CatalogProvider

Let CatalogProvider.get_configured_stream_info() handle the stream-not-found case consistently

The defensive check for flat vs nested PK format (lines 620-621) shouldn't be needed since get_configured_catalog() (lines 452-456) already ensures primary keys are in the correct nested format.

719-740: Minor optimization: pk_value_str computed twice in fallback path

When a DeclarativeExecutor falls back to scanning (lines 726-729), pk_value_str is computed twice:

Line 720: For the direct fetch attempt

Line 740: For the scanning fallback

You could optimize by computing once before line 719:
+        pk_value_str = self._normalize_and_validate_pk_value(stream_name, pk_value)
         if isinstance(self.executor, DeclarativeExecutor):
-            pk_value_str = self._normalize_and_validate_pk_value(stream_name, pk_value)
             try:
                 return self.executor.fetch_record(
Then remove line 740 since pk_value_str is already available. This avoids redundant validation when falling back to scanning, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8028397 and a2c4b42.

📒 Files selected for processing (1)

airbyte/sources/base.py (2 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2024-10-08T15:34:31.026Z

Learnt from: Suraj-Vishwakarma70
Repo: airbytehq/PyAirbyte PR: 285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.

Applied to files:

airbyte/sources/base.py

🧬 Code graph analysis (1)

airbyte/sources/base.py (4)

airbyte/_executors/declarative.py (2)

DeclarativeExecutor (45-282)

fetch_record (148-282)

airbyte/_util/name_normalizers.py (3)

LowerCaseNormalizer (48-87)

normalize (23-25)

normalize (53-87)

airbyte/shared/catalog_providers.py (3)

configured_catalog (72-74)

CatalogProvider (31-226)

get_primary_keys (146-172)

airbyte/exceptions.py (3)

AirbyteStreamNotFoundError (400-404)

PyAirbyteInputError (201-210)

AirbyteRecordNotFoundError (416-420)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Pytest (All, Python 3.10, Windows)
GitHub Check: Pytest (All, Python 3.11, Windows)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (No Creds)
GitHub Check: Pytest (Fast)

airbyte/sources/base.py

- Add 'from typing import Any' import - Fix type annotation on line 183: pk_value: any -> pk_value: Any - Fix type annotation on line 254: pk_value: any -> pk_value: Any Addresses GitHub review comment #2547624128 Co-Authored-By: AJ Steers <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

tests/unit_tests/test_get_record.py (4)

19-92: Stream validation test is solid; consider also asserting on the constructed fetch path?

The parametrized coverage for existing vs missing streams looks good and the mocking of declarative_source.streams / SimpleRetriever aligns with the implementation. Would it be worth adding an assertion that send_request is called with the expected path (including the /primary_key_value suffix) so regressions in path construction are caught here as well, wdyt?

94-150: _get_stream_primary_key test currently leans on CatalogProvider internals

These tests indirectly depend on CatalogProvider behavior via _get_stream_primary_key, which is nice for integration coverage but could become brittle if CDK/CatalogProvider internals change. Would you consider patching CatalogProvider.get_primary_keys (or injecting a small fake) so this test focuses purely on how Source wires primary_key shapes into the provider, while still leaving a separate integration test for the full stack, wdyt?

207-222: Non-declarative executor test could assert on the exact error message

The test already uses match="only supported for declarative sources", which is helpful. If the error message ever grows, would you want to tighten this a bit more (e.g., anchor with ^/$ or assert on a dedicated error subclass) so refactors don’t accidentally weaken the contract, or is the current level of coupling intentional, wdyt?

224-270: Executor mocks may not exercise the same isinstance path as production

In get_record tests you use Mock(spec=DeclarativeExecutor) as the executor. Given that Source.get_record currently branches on isinstance(self.executor, DeclarativeExecutor) (per airbyte/sources/base.py), a plain Mock won’t satisfy that isinstance check, so these tests might not be hitting the same code path as a real DeclarativeExecutor instance. Would it be safer to use a lightweight DeclarativeExecutor subclass test double, or create_autospec(DeclarativeExecutor, instance=True), so the isinstance branch is exercised and any future changes to the type check are properly covered, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2c4b42 and 8cdfb84.

📒 Files selected for processing (1)

tests/unit_tests/test_get_record.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/unit_tests/test_get_record.py (4)

airbyte/_executors/declarative.py (2)

DeclarativeExecutor (45-282)

fetch_record (148-282)

airbyte/sources/base.py (4)

Source (69-1138)

_get_stream_primary_key (606-638)

_normalize_and_validate_pk_value (640-688)

get_record (690-768)

airbyte/exceptions.py (2)

AirbyteStreamNotFoundError (400-404)

PyAirbyteInputError (201-210)

airbyte/_executors/python.py (1)

VenvExecutor (27-344)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Windows)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Windows)
GitHub Check: Pytest (No Creds)
GitHub Check: Pytest (Fast)

🔇 Additional comments (1)

tests/unit_tests/test_get_record.py (1)

152-205: Great coverage of PK normalization edge cases

The parametrization here does a nice job covering strings, ints, dicts, composite keys, and missing PK definitions, and the expectations match the behavior described in _normalize_and_validate_pk_value. I don’t see any obvious gaps in the input space given the current implementation.

- Add catalog_provider property to Source class that returns CatalogProvider instance - Remove _get_stream_primary_key() helper method (lines 606-638) - Replace both usages with self.catalog_provider.get_primary_keys() - Update test to patch catalog_provider property instead of private method - Remove tests for deleted private method (test_source_get_stream_primary_key) This simplifies the code by using the existing CatalogProvider utility directly instead of maintaining duplicate primary key extraction logic. Co-Authored-By: AJ Steers <[email protected]>

aaronsteers · 2025-12-05T22:11:52Z

Devin, add MCP tools for the new capabilities. Specifically "get source record" and maybe others.

devin-ai-integration · 2025-12-05T22:17:52Z

Thanks AJ! I'll add an MCP tool called get_source_record to expose the new get_record() functionality through the MCP interface. This will make it accessible to AI agents and other MCP clients.

The tool will follow the same pattern as read_source_stream_records but fetch a single record by primary key value instead of reading multiple records from a stream.

Implementing now.

Co-Authored-By: AJ Steers <[email protected]>

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

airbyte/sources/base.py (1)
711-729: Critical: Field name normalization mismatch in scanning logic.

This is the bug flagged in the previous review and it's still present. Here's what's happening:

Line 712: pk_field comes from catalog_provider.get_primary_keys(), which returns normalized field names (e.g., "id" for a source field "ID"), as confirmed by the relevant code snippet showing LowerCaseNormalizer.normalize() is applied.

Line 716: get_records(stream_name) uses the default normalize_field_names=False, so records contain original field names from the source (e.g., {"ID": "123", ...}).

Line 729: record_data.get(pk_field) looks up the normalized name "id" in a record that has the original field name "ID", which returns None and causes the match to fail.

Example that breaks:

Source defines primary key: "ID" (uppercase)

After catalog normalization: pk_field = "id" (lowercase)

Record from source: {"ID": "123", ...} (original casing preserved)

Lookup: record_data.get("id") → None ❌

Match fails even though the record exists

The fix is straightforward—pass normalize_field_names=True to ensure records use normalized field names:
-        for record in self.get_records(stream_name):
+        for record in self.get_records(stream_name, normalize_field_names=True):
Would you also want to add a unit test that covers a source with non-lowercase primary key field names to prevent regressions, wdyt?

🧹 Nitpick comments (2)

airbyte/sources/base.py (2)

422-425: Consider caching the catalog_provider to avoid repeated instantiation.

The property creates a new CatalogProvider instance on each access. Looking at get_record(), it calls this property twice indirectly (via _normalize_and_validate_pk_value at line 691 and directly at line 712), creating two instances per call.

While CatalogProvider initialization is likely lightweight, would it make sense to cache the instance—perhaps as self._catalog_provider—and only recreate it when configured_catalog changes? Or is the instantiation cost negligible enough that this optimization isn't worth the added complexity, wdyt?

690-713: Optional: Avoid duplicate PK normalization in scanning path.

The code calls _normalize_and_validate_pk_value() twice when entering the scanning fallback:

Line 691: First call (before attempting declarative fetch)

Line 711: Second call (when entering scan fallback)

Since pk_value_str is already computed at line 691, you could reuse it at line 711 instead of re-normalizing. Similarly, primary_key_fields could be retrieved once.

Would it be worth caching these values to avoid the redundant work, or is the simplicity of the current structure preferred? Wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cdfb84 and ba85aaf.

📒 Files selected for processing (3)

airbyte/mcp/local_ops.py (1 hunks)
airbyte/sources/base.py (3 hunks)
tests/unit_tests/test_get_record.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/unit_tests/test_get_record.py

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2024-10-08T15:34:31.026Z

Learnt from: Suraj-Vishwakarma70
Repo: airbytehq/PyAirbyte PR: 285
File: airbyte/sources/base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Ensure consistent naming for attributes in the `Source` class in `airbyte/sources/base.py`, such as renaming `_to_be_selected_stream` to `_to_be_selected_streams`.

Applied to files:

airbyte/sources/base.py

🧬 Code graph analysis (2)

airbyte/mcp/local_ops.py (3)

airbyte/mcp/_tool_utils.py (1)

mcp_tool (102-148)

airbyte/sources/base.py (2)

config_spec (359-368)

get_record (661-739)

airbyte/mcp/_util.py (1)

resolve_config (129-226)

airbyte/sources/base.py (4)

airbyte/_executors/declarative.py (1)

fetch_record (148-282)

airbyte/_util/name_normalizers.py (3)

LowerCaseNormalizer (48-87)

normalize (23-25)

normalize (53-87)

airbyte/shared/catalog_providers.py (3)

CatalogProvider (31-226)

configured_catalog (72-74)

get_primary_keys (146-172)

airbyte/exceptions.py (2)

PyAirbyteInputError (201-210)

AirbyteRecordNotFoundError (416-420)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Pytest (All, Python 3.11, Windows)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Windows)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (No Creds)
GitHub Check: Pytest (Fast)

🔇 Additional comments (3)

airbyte/mcp/local_ops.py (2)

464-469: Question: Is idempotent=True accurate for record fetching?

The decorator marks this tool as idempotent=True, which typically means repeated calls with the same inputs produce the same result. However, since the underlying data source could change between calls (records can be updated or deleted), the same pk_value might return different data or even fail on subsequent calls.

Would it make more sense to omit the idempotent flag or set it to False? Or is the idempotency claim intended at the "operation level" (i.e., the operation itself doesn't modify state), regardless of whether the returned data changes? Wdyt?

547-580: LGTM! Error handling pattern is appropriate.

The implementation correctly wraps source.get_record() with proper configuration resolution and error handling. The pattern of returning an error string instead of raising exceptions matches the existing MCP tool conventions in this file (e.g., read_source_stream_records), which is appropriate for MCP tools that need graceful degradation.

The stderr logging at line 568-571 is also consistent with similar tools in this file.

airbyte/sources/base.py (1)

611-659: LGTM! PK validation logic is sound.

The method correctly handles the three input forms (string, int, dict) and properly validates composite PKs. The normalization logic at lines 647-648 is correct: both the user-provided key and the catalog's pk_field are normalized before comparison, ensuring case-insensitive matching.

The NotImplementedError for composite keys (lines 632-636) is appropriate given the PR objectives note this limitation.

devin-ai-integration bot assigned aaronsteers Nov 20, 2025

github-code-quality bot found potential problems Nov 20, 2025

View reviewed changes

airbyte/_executors/declarative.py Fixed Show fixed Hide fixed

style: Fix ruff formatting in test_get_record.py

1d28249

Co-Authored-By: AJ Steers <[email protected]>

coderabbitai bot requested changes Nov 20, 2025

View reviewed changes

tests/unit_tests/test_get_record.py Show resolved Hide resolved

coderabbitai bot approved these changes Nov 20, 2025

View reviewed changes

github-code-quality bot found potential problems Nov 20, 2025

View reviewed changes

airbyte/_executors/declarative.py Fixed Show fixed Hide fixed

coderabbitai bot reviewed Nov 20, 2025

View reviewed changes

aaronsteers commented Nov 20, 2025

View reviewed changes

airbyte/_executors/declarative.py Outdated Show resolved Hide resolved

github-code-quality bot found potential problems Nov 20, 2025

View reviewed changes

airbyte/_executors/declarative.py Fixed Show fixed Hide fixed

test: Update test to match fetch_record() signature without config pa…

42bfd1f

…rameter The test was expecting the config parameter that was removed in the previous commit. Updated the assertion to match the new signature. Co-Authored-By: AJ Steers <[email protected]>

coderabbitai bot requested changes Nov 20, 2025

View reviewed changes

airbyte/sources/base.py Show resolved Hide resolved

coderabbitai bot requested changes Nov 20, 2025

View reviewed changes

tests/unit_tests/test_get_record.py Outdated Show resolved Hide resolved

tests/unit_tests/test_get_record.py Outdated Show resolved Hide resolved

aaronsteers commented Nov 20, 2025

View reviewed changes

airbyte/_executors/declarative.py Show resolved Hide resolved

aaronsteers commented Nov 20, 2025

View reviewed changes

airbyte/sources/base.py Outdated Show resolved Hide resolved

aaronsteers commented Nov 20, 2025

View reviewed changes

airbyte/sources/base.py Show resolved Hide resolved

coderabbitai bot requested changes Nov 20, 2025

View reviewed changes

airbyte/sources/base.py Show resolved Hide resolved

coderabbitai bot reviewed Nov 20, 2025

View reviewed changes

coderabbitai bot approved these changes Nov 20, 2025

View reviewed changes

aaronsteers marked this pull request as draft November 27, 2025 03:20

feat: Add get_source_record MCP tool for singleton record fetching

ba85aaf

Co-Authored-By: AJ Steers <[email protected]>

devin-ai-integration bot changed the title ~~feat: Add singleton record fetch functionality without CDK dependency~~ feat: Add singleton record fetch and MCP tool for declarative sources Dec 5, 2025

coderabbitai bot reviewed Dec 5, 2025

View reviewed changes

feat: Add singleton record fetch and MCP tool for declarative sources #872

Are you sure you want to change the base?

feat: Add singleton record fetch and MCP tool for declarative sources #872

Conversation

aaronsteers commented Nov 20, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

feat: Add singleton record fetch and MCP tool for declarative sources

Summary

Review & Testing Checklist for Human

Test Plan Recommendation

Notes

Summary by CodeRabbit

Uh oh!

devin-ai-integration bot commented Nov 20, 2025

Uh oh!

devin-ai-integration bot commented Nov 20, 2025

🤖 Devin AI Engineer

Uh oh!

github-actions bot commented Nov 20, 2025

👋 Greetings, Airbyte Team Member!

Testing This PyAirbyte Version

Helpful Resources

PR Slash Commands

Community Support

Uh oh!

Uh oh!

github-actions bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Fast Tests Only, No Creds)

Uh oh!

coderabbitai bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Full)

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

aaronsteers commented Dec 5, 2025

Uh oh!

devin-ai-integration bot commented Dec 5, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

aaronsteers commented Nov 20, 2025 •

edited by coderabbitai bot

Loading

github-actions bot commented Nov 20, 2025 •

edited

Loading

coderabbitai bot commented Nov 20, 2025 •

edited

Loading

github-actions bot commented Nov 20, 2025 •

edited

Loading